Unification-Based Glossing

Authors

  • Vasileios Hatzivassiloglou
  • Kevin Knight
Abstract

We present an approach to syntax-based machine translation that combines unification-style interpretation with statistical processing. This approach enables us to translate any Japanese newspaper article into English, with quality far better than a word-for-word translation. Novel ideas include the use of feature structures to encode word lattices and the use of unification to compose and manipulate lattices. Unification also allows us to specify abstract features that delay target-language synthesis until enough source-language information is assembled. Our statistical component enables us to search efficiently among competing translations and locate those with high English fluency.

1 Background

JAPANGLOSS [Knight et al., 1994; 1995] is a project whose goals are to scale up knowledge-based machine translation (KBMT) techniques to handle Japanese-English newspaper MT, to achieve higher quality output than is currently available, and to develop techniques for rapidly constructing MT systems. We built the first version of JAPANGLOSS in nine months and recently participated in an ARPA evaluation of MT quality [White and O'Connell, 1994]. JAPANGLOSS is an effort within the larger PANGLOSS [NMSU/CRL et al., 1995] MT project.

Our approach is to use a KBMT framework, but to fall back on statistical methods when knowledge gaps arise (as they inevitably will). We syntactically analyze Japanese text, map it to a semantic representation, then generate English. Figure 1 shows a sample translation.

Parsing is bottom-up, driven by an augmented context-free grammar whose format is roughly like that of [Shieber, 1986]. Our grammar rules look like this:

Figure 1 (sample translation), OUTPUT: The new company plans to establish in February.

The semantic representation contains conceptual tokens drawn from the 70,000-term SENSUS ontology [Knight and Luk, 1994]. Semantic analysis proceeds as a bottom-up walk of the parse tree, in the style of Montague and Moore [Dowty et al., 1981; Moore, 1989].

Semantics is compositional, with each parse tree node assigned a meaning based on the meanings of its children. Leaf node meanings are retrieved from a semantic lexicon, while meaning composition rules handle internal nodes. Semantic rules and lexical entries are sensitive to syntactic structure,
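As a rough illustration of the unification operation that the abstract relies on, the following is a minimal sketch, not the JAPANGLOSS implementation. It assumes feature structures are represented as nested Python dictionaries with string atoms, and it omits structure sharing (reentrancy), which full unification grammars in the style of [Shieber, 1986] support. The function name `unify` and the example features (`cat`, `agr`, `num`, `per`) are hypothetical.

```python
from typing import Any, Optional


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns the merged structure, or None if the two structures clash.
    Assumption: no reentrancy; None itself is never used as a feature value.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # a clash below this feature fails the whole unification
                result[key] = merged
            else:
                result[key] = b_val  # feature present only in b is carried over
        return result
    # Atomic values (or a dict against an atom) unify only when they are equal.
    return a if a == b else None


# Hypothetical example: a noun phrase and the subject slot a verb expects.
noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
subject_slot = {"agr": {"num": "sg", "per": "3"}}
plural_slot = {"agr": {"num": "pl"}}

compatible = unify(noun_phrase, subject_slot)  # information from both sides is merged
clash = unify(noun_phrase, plural_slot)        # "sg" and "pl" conflict, so this is None
```

Because unification only adds compatible information, a partially specified structure can be carried along and filled in later, which is the property the abstract appeals to when it describes delaying target-language synthesis.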

Similar resources

The Ubiquity of the Gloss

This paper argues that glossing is an essential stage in the borrowing of writing systems. I use the term “glossing” in a somewhat extended sense to refer to a process where a text in one language is prepared (annotated, marked) to be read in another. I argue that this process of “vernacular reading” – reading a text written in the script, orthography, lexicon and grammar of a more prestigious ...

The Effects of Oral Code-mixing and Glossing on Iranian EFL Learners' Vocabulary Knowledge

The current study investigated the effects of oral code-mixing and glossing on L2 vocabulary learning. To this end, 60 EFL learners studying at pre-university school were given a pre-test to make sure that they did not have any prior knowledge of the target words. Based on their scores in the pre-test, 36 pre-university students were selected and divided into three groups, including two experim...

Automatic interlinear glossing as two-level sequence classification

Interlinear glossing is a type of annotation of morphosyntactic categories and crosslinguistic lexical correspondences that allows linguists to analyse sentences in languages that they do not necessarily speak. Automatising this annotation is necessary in order to provide glossed corpora big enough to be used for quantitative studies. In this paper, we present experiments on the automatic gloss...

Publication date: 1995